Power Spectra Estimation 



1.0 INTRODUCTION 

Perhaps one of the more important application areas of digi- 
tal signal processing (DSP) is the power spectral estimation 
of periodic and random signals. Speech recognition prob- 
lems use spectrum analysis as a preliminary measurement 
to perform speech bandwidth reduction and further acoustic 
processing. Sonar systems use sophisticated spectrum 
analysis to locate submarines and surface vessels. Spectral 
measurements in radar are used to obtain target location 
and velocity information. The vast variety of measurements 
spectrum analysis encompasses is perhaps limitless and it 
will thus be the intent of this article to provide a brief and 
fundamental introduction to the concepts of power spectral 
estimation. 
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Since the estimation of power spectra is statistically based 
and covers a variety of digital signal processing concepts, 
this article attempts to provide sufficient background 
through its contents and appendices to allow the discussion 
to flow void of discontinuities. For those familiar with the 
preliminary background and seeking a quick introduction 
into spectral estimation, skipping to Sections 6.0 through 
1 1 .0 should suffice to fill their need. Finally, engineers seek- 
ing a more rigorous development and newer techniques of 
measuring power spectra should consult the excellent refer- 
ences listed in Appendix D and current technical society 
publications. 

As a brief summary and quick lookup, refer to the Table of 
Contents of this article. 
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2.0 WHAT IS A SPECTRUM? 

A spectrum is a relationship typically represented by a plot 
of the magnitude or relative value of some parameter 
against frequency. Every physical phenomenon, whether it 
be an electromagnetic, thermal, mechanical, hydraulic or 
any other system, has a unique spectrum associated with it. 
In electronics, the phenomena are dealt with in terms of 
signals, represented as fixed or varying electrical quantities 
of voltage, current and power. These quantities are typically 
described in the time domain and for every function of time, 
f(t), an equivalent frequency domain function F(co) can be 
found that specifically describes the frequency-component 
content (frequency spectrum) required to generate f(t). A 
study of relationships between the time domain and its cor- 
responding frequency domain representation is the subject 
of Fourier analysis and Fourier transforms. 
The forward Fourier transform , time to frequency domain, of 
the function x(t) is defined 



and the inverse Fourier transform, 
main, of X(a>) is 



frequency to time do- 



F-1[X(co)] 



2tt J - 



X(a>)d«t do) = x(t) 



(2) 



(For an in-depth study of the Fourier integral see reference 
19.) Though these expressions are in themselves self-ex- 
planatory, a short illustrative example will be presented to 
aid in relating the two domains. 

If an arbitrary time function representation of a periodic 
electrical signal, f(t), were plotted versus time as shown in 
Figure 1 , its Fourier transform would indicate a spectral con- 
tent consisting of a DC component, a fundamental frequen- 
cy component a> , a fifth harmonic component 5<o and a 
ninth harmonic component 9« (see Figure 2). It is illustra- 
tively seen in Figure 3 that the superposition of these fre- 
quency components, in fact, yields the original time function 
f(t). 



F[x(t)] = 



x(t)e -]<■>* dt = X(co) 
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FIGURE 1. An Electrical Signal f(t) 
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FIGURE 2. Spectral Composition or 
Spectrum F(a>) or f(t) 
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FIGURE 3. Combined Time Domain and Frequency Domain Plots 



3.0 ENERGY AND POWER 

In the previous section, time and frequency domain signal 
functions were related through the use of Fourier trans- 
forms. Again, the same relationship will be made in this sec- 
tion but the emphasis will pertain to signal power and ener- 
gy- 

Parseval's theorem relates the representation of energy, 
o)(t), in the time domain to the frequency domain by the 
statement 

Too Too 

u(t)= f 1 (t)f 2 (t)dt= F^OFaffldf P) 

J - oo J - oo 

where f(t) is an arbitrary signal varying as a function of time 
and F(t) its equivalent Fourier transform representation in 
the frequency domain. 
The proof of this is simply 

Too Too 

j _ ^ f 1 (t)f 2 (t) dt = J _ ^ f 1 (t)f 2 (t) dt (4a) 

Letting F-i(f) be the Fourier transform of f-i(t) 
| fi(t)f 2 (t)dt = J F-i (f)ei2-n-ft df f 2 (t)dt ( 4b > 

| F^f) e j277-ftdf f 2 (t) dt < 4c ) 

Rearranging the integrand gives 

| " f 1 (t)f 2 (t)dt= * F^f) J * f 2 (t)ei2irft dt df ( 4d ) 

and the factor in the brackets is seen to be F 2 (-f) (where 
F 2 (-f) = F 2 * (f) the conjugate of F 2 (f) so that 



Too Too 

I f 1 (t)f 2 (t)dt= F^OFgf-fJdf 

J - oo J - oo 

try to this theon 
F*(f), the com| 

Too Too 

j _j2(t)dt = J_ oo F(f)F*(f)df 



(4e) 



A corollary to this theorem is the condition f-| (t) = f 2 (t) then 
F(-f) = F*(f), the complex conjugate of F(f), and 



r 



|F(f)|2df 



(5a) 



(5b) 



This simply says that the total energy 1 in a signal f(t) is 
equal to the area under the square of the magnitude of its 
Fourier transform. | F(f)| 2 is typically called the energy densi- 
ty, spectral density, or power spectral density function and 
|F(f)| 2 df describes the density of signal energy contained in 
the differential frequency band from f to f + dF. 
In many electrical engineering applications, the instanta- 
neous signal power is desired and is generally assumed to 
be equal to the square of the signal amplitudes i.e., f 2 (t). 

'Recall the energy storage elements 



<o(t). 



Capacitor 



W (t). 



"T di 

L — idt - 
o dt 



vc — dt = cvdv = - cv 2 
) dt Jo 2 



This is only true, however, assuming that the signal in the 
system is impressed across a id resistor. It is realized, for 
example, that if f(t) is a voltage (current) signal applied 
across a system resistance R, its true instantaneous power 
would be expressed as [f(t)] 2 /R (or for current [f(t)] 2 R) 
thus, [f(t)] 2 is the true power only if R = 111. 
So for the general case, power is always proportional to the 
square of the signal amplitude varied by a proportionality 
constant R, the impedance level in a circuit. In practice, 
however, it is undesirable to carry extra constants in the 
computation and customarily for signal analysis, one as- 
sumes signal measurement across a 1 fl resistor. This al- 
lows units of power to be expressed as the square of the 
signal amplitudes and the units of energy measured as 
volts 2 -second (amperes 2 -second). 

For periodic signals, equation (5) can be used to define the 
average power, P aV g. over a time interval t 2 to t-| by integrat- 
ing [f(t)] 2 from ti to t 2 and then obtaining the average after 
dividing the result by t 2 — t-| or 



Pavg 



1 



t 2 - ti J tl 



r 2 f2(t> dt 

Ju 



tJo 



f 2 (t) dt 



(6a) 



(6b) 



where T is the period of the signal. 

Having established the definitions of this section, energy 

can now be expressed in terms of power, P(t), 

Too 

«(t) = [f(t)] 2 dt ( 7a > 

J -oo 
Too 

P(t) dt ( 7 &) 

J -oo 

with power being the time rate of change of energy 

P»-*? (8) 

dt 

As a final clarifying note, again, |F(f)| 2 and P(t), as used in 
equations (7b) and (8), are commonly called throughout the 
technical literature, energy density spectral density, or pow- 
er spectral density functions, PSD. Further, PSD may be 
interpreted as the average power associated with a band- 
width of one hertz centered at f hertz. 

4.0 RANDOM SIGNALS 

It was made apparent in previous sections that the use of 
Fourier transforms for analysis of linear systems is wide- 
spread and frequently leads to a saving in labor. 
In view of using frequency domain methods for system anal- 
ysis, it is natural to ask if the same methods are still applica- 
ble when considering a random signal system input. As will 
be seen shortly, with some modification, they will still be 
useful and the modified methods offer essentially the same 
advantages in dealing with random signals as with nonran- 
dom signals. 

It is appropriate to ask if the Fourier transform may be used 
for the analysis of any random sample function. Without 
proof, two reasons can be discussed which make the trans- 
form equations (1) and (2) invalid. 



Firstly, X(w) is a random variable since, for any fixed a, 
each sample would be represented by a different value of 
the ensemble of possible sample functions. Hence, it is not 
a frequency representation of the process but only of one 
member of the process. It might still be possible, however, 
to use this function by finding its mean or expected value 
over the ensemble except that the second reason netages 
this approach. The second reason for not using the X(o>) of 
equations (1) and (2) is that, for stationary processes, it al- 
most never exists. As a matter of fact, one of the conditions 
for a time function to be Fourier transformable is that it be 
integrable so that, 

Too 

|x(t)| dt < oo < 9 ) 

J -oo 

A sample from a stationary random process can never satis- 
fy this condition (with the exception of generalized functions 
inclusive of impulses and so forth) by the argument that if a 
signal has nonzero power, then it has infinite energy and if it 
has finite energy then it has zero power (average power). 
Shortly, it will be seen that the class of functions having no 
Fourier integral, due to equation (9), but whose average 
power is finite can be described by statistical means. 
Assuming x(t) to be a sample function from a stochastic 
process, a truncated version of the function xj(t) is defined 
as 

f x(t) 1 1 1 < T 

xt«=L M. T no) 



|t|>T 



and 



x(t) = lim x T (t) 



(11) 



This truncated function is defined so that the Fourier trans- 
form of xy(t) can be taken. If x(t) is a power signal, then 
recall that the transform of such a signal is not defined 



but that 



|x(t)| dt not less than • 



|x T (t)| dt < 



(12) 



(13) 



The Fourier transform pair of the truncated function x-r(t) 
can thus be taken using equations (1) and (2). Since x(t) is a 
power signal, there must be a power spectral density func- 
tion associated with it and the total area under this density 
must be the average power despite the fact that x(t) is non- 
Fourier transformable. 
Restating equation (5) using the truncated function x-r(t) 



Too r< 

and dividing both sides by 2T 

- n x^t)dt=-p 

2Tj -oo tw 2Tj - 



|X T (f)|2 df 



|X T (f)l 2 df 



(14) 



(15) 



gives the left side of equation (15) the physical significance 
of being proportional to the average power of the sample 
function in the time interval — T to T. This assumes x-r(t) is a 
voltage (current) associated with a resistance. More pre- 
cisely, it is the square of the effective value of x-r(t) 



and for an ergodic process approaches the mean-square 
value of the process as T approaches infinity. 
At this particular point, however, the limit as T approaches 
infinity cannot be taken since Xy(f) is non-existent in the 
limit. Recall, though, that X-r(f) is a random variable with 
respect to the ensemble of sample functions from which x(t) 
was taken. The limit of the expected value of 



1 
2T 



|x T (f)l : 



can be reasonably assumed to exist since its integral, equa- 
tion (15), is always positive and certainly does exist. If the 
expectations E( ), of both sides of equation (15) are taken 



&r. 



xf(t) dt 



fei: 



|x T (f)|2df (16) 



then interchanging the integration and expectation and at 
the same time taking the limit as T — > °° 



lim 

T — 



results in 



2Tj- 



x2 (t) dt 



Jim 

2: 



(17) 



E(|x T (f)|2)df 



< x2 (t) > 



E(|x T (1 



L df (18) 



x2 (t) - 



-df 



(19) 



. .«T->» 2T 
where x 2 (t) is defined as the mean-square value ( de- 
notes ensemble averaging and < > denotes time averag- 
ing). 

For stationary processes, the time average of the mean- 
square value is equal to the mean-square value so that 
equation (18) can be restated as 

| im E{|X T (f)|2 l 

, T -*■ °° 2T 
The integrand of the right side of equation (19), similar to 
equation (5b), is called the energy density spectrum or pow- 
er spectral density function of a random process and will be 
designated by S(f) where 

S(f) = !im_lM! (20) 

Recall that letting T — > °° is not possible before taking the 
expectation. 

Similar to the argument in Section III, the physical interpre- 
tation of power spectral density can be thought of in terms 
of average power. If x(t) is a voltage (current) associated 
with a 1fi resistance, x 2 (t) is just the average power dissi- 
pated in that resistor and S(f) can be interpreted as the 
average power associated with a bandwidth of one hertz 
centered at f hertz. 

S(f) has the units volts 2 -second and its integral, equation 
(19), leads to the mean square value hence, 

— r°° 

x2(t) = S(f)df (21) 



Having made the tie between the power spectral density 
function S(f) and statistics, an important theorem in power 
spectral estimation can now be developed. 
Using equation (20) and recalling that Xj(f) is the Fourier 
transform of x-r(t), assuming a nonstationary process, 



s(f) = !f m _ 



S(f) 



lim 
T — 



E{|X T (f)|2) 
2T 



XTtt^'ldt! 
x T (t 2 )e itut2 dt 2 



(22) 



(23) 



Note that |Xj(f)| 2 = X T (f)X T (-f) and that in order to distin- 
guish the variables of integration when equation (23) is re- 
manipulated the subscripts of ti and t 2 have been intro- 
duced. So, (see Appendix B) 

S(f)=T IT U=o(i (24) 



lim 
T — 



2T 
dt 2 j" 

1 
2T 



-jo)(t 2 -ti) 



r» d,2 i- 



x T (ti)x T (t 2 ) dt! 



EIxTtt^xjfe)] (25) 
£ -Mt 2 -t 1 ) dti 



Finally, the expectation E[xT(ti)xj(t 2 )] is recognized as the 
autocorrelation, R X x(t-|, (2) (Appendix A.14) function of the 
truncated process where 

E[XT(ti)XT(t2)] = R xx (tl. t 2 ) I ti I, 1 1 2 | < T 



= 


everywhere else. 


Substituting 




t2 - h = r 


(26) 


dt 2 = dT 


(27) 


equation (25) next becomes 




S^f^J^dx 


IOR\ 



S(f) 



.Rxx(ti.ti+T)e-iwdt 

1 
"2T 



lim 



Rxxfti.ti + rjdt! ri»t dr 



(29) 



We see then that the special density is the Fourier transform 
of the time average of the autocorrelation function. The rela- 
tionship of equation (29) is valid for a nonstationary process. 



For the stationary process, the autocorrelation function is 
independent of time and therefore 

<Rxx(tl.tl, + T)> = R xx (t) (30) 

From this it follows that the spectral density of a stationary 
random process is just the Fourier transform of the auto- 
correlation function where 



S(f) 



RxxMc-i^dr 



and 



(31) 



(32) 



RxxM= S(f)«i«tdf 

j — 00 

are described as the Wiener-Khintchine theorem. 
The Wiener-Khintchine relation is of fundamental impor- 
tance in analyzing random signals since it provides a link 
between the time domain [correlation function, R X x( r )] and 
the frequency domain [spectral density, S(f)]. Note that the 
uniqueness is in fact the Fourier transformability. It follows, 
then, for a stationary random process that the autocorrela- 
tion function is the inverse transform of the spectral density 
function. For the nonstationary process, however, the auto- 
correlation function cannot be recovered from the spectral 
density. Only the time average of the correlation is recover- 
able, equation (29). 

Having completed this section, the path has now been 
paved for a discussion on the techniques used in power 
spectral estimation. 

5.0 FUNDAMENTAL PRINCIPLES OF 
ESTIMATION THEORY 

When characterizing or modeling a random variable, esti- 
mates of its statistical parameters must be made since the 
data taken is a finite sample sequence of the random pro- 
cess. 

Assume a random sequence x(n) from the set of random 
variables fx n ) to be a realization of an ergodic random pro- 
cess (random for which ensemble or probability averages, 
E[ ], are equivalent to time averages, < >) where for all n, 

(33) 



[XrJ = \[ 



xf x (x) dx 



Assuming further that the estimate of the desired averages 
of the random variables {x n ) from a finite segment of the 
sequence, x(n) for < n < N-1, to be 

N 



m 



<Xn> 



N — > 00 2N 
then for each sample sequence 

m = <x(n)> = jy 



ITT Z Xn 



1 2N 



TT £ x(n > 

n = -N 



(34) 



(35) 



Since equation (35) is a precise representation of the mean 
value in the limit as N approaches infinity then 

N - 1 



^Z x(n) 



(36) 



may be regarded as a fairly accurate estimate of m for suffi- 
ciently large N. The caret (A) placed above a parameter is 
representative of an estimate. The area of statistics that 
pertains to the above situations is called estimation theory. 
Having discussed an elementary statistical estimate, a few 
common properties used to characterize the estimator will 
next be defined. These properties are bias, variance and 
consistency. 

Bias 

The difference between the mean or expected value E[ij] of 
an estimate -q and its true value -q is called the bias. 



bias = B » = ii - E[ij] 



(37) 



Thus, if the mean of an estimate is equal to the true value, it 
is considered to be unbiased and having a bias value equal 
to zero. 

Variance 

The variance of an estimator effectively measures the width 
of the probability density (Appendix A.4) and is defined as 



V 



(r, - E[t)])2 



(38) 



A good estimator should have a small variance in addition to 
having a small bias suggesting that the probability density 
function is concentrated about its mean value. The mean 
square error associated with this estimator is defined as 



mean square error = E (t; — t;) 2 



B^ (39) 



Consistency 

If the bias and variance both tend to zero as the limit tends 
to infinity or the number of observations become large, the 
estimator is said to be consistent. Thus, 



and 



lim 
N- 



lim 
N- 



B 



V 











(40) 



(41) 



This implies that the estimator converges in probability to 
the true value of the quantity being estimated as N becomes 
infinite. 

In using estimates the mean value estimate of m x , for a 
Gaussian random process is the sample mean defined as 
N - 1 



(42) 



ntv = — > x, 
x N Z^ ' 

i = 

the mean square value is computed as 


E[ ^ ] = ri X Z E[XiXj] 

i =0 j = 


1 

~ N2 


N - 1 N - 1 N - 1 
i = i = j = 

L i*i 



Ebql'Elxj] 



(43) 



(44) 



1 



N(E[x 2 ]) 




E[x;] 




EM 



1,2, „ N - 1 

= -E[x 2 ] + (m x )2 — 

thus allowing the variance to be stated as 



E[(m x )2] - (E[m x ])2 



1 o o N - 1 

-E[x 2 ] + (m x ) 

N *' N 



(E[m x ]|2 



1,2, 2 N - 1 

-E[x 2 ] + (m 2 ) — 



-(E[x 2 J -m 2 x ) 



N 



(45) 

(46) 

(47) 
(48) 
(49) 
(50) 
(51) 



This says that as the number of observations N increase, 
the variance of the sample mean decreases, and since the 
bias is zero, the sample mean is a consistent estimator. 
If the variance is to be estimated and the mean value is a 
known then 



N - 1 

'- Ye, 

N L^ 



N 



m x )2 



(52) 



i = 



this estimator is consistent. 

If, further, the mean and the variance are to be estimated 
then the sample variance is defined as 
N - 1 



1 Y 

N Z^ 



(x, - m x )2 



(53) 



again m x is the sample mean. 

The only difference between the two cases is that equation 
(52) uses the true value of the mean, whereas equation (53) 
uses the estimate of the mean. Since equation (53) uses an 
estimator the bias can be examined by computing the ex- 
pected value of <r x therefore, 
N 1 



E[cr x ] 



± jT (E[ Xi l - E[m x ])2 

i = 
N - 1 ( 

ill* 

i = I 

;2>-,iX(i 



f] - 2E[Xim x ] + Elm 2 ] 



N - 1 /N - 1 



j = 



(54) 



(55) 



(56) 



N - 1 N - 1 

1 V V 



E[xix i ] ) + ^2. 2. E[XiXjl 

i = j = 



N - 1 /N-1 

E[xf]-|| > Llxf] 
\ i = 



i = \ i = 

N - 1 N - 1 \ 

Z Z E[Xii,E[x i ] j + i(Z 

N-1 N - 1 



the expected value 



= o j = o 



(57) 



N - 1 N - 1 
Lixfl > > E[ Xi ]»E[xj] 

= j = 



1 f N«E[xf] J - j^ | N*L!x;'] N(N 1 )m; 



+ — | N«Etxf] + N(N - 1)m; 



1(n 

NV 


•E[xf 


)" 


S^ 


\ 2N(N - 1) 
) N2 


m 2 . 






+ 


><> + 


N(N - 
N2 


D 2 

m x 




l( N .E[xf 


)" 


^(Etxfn- 


2(N - 

N 


1) m 2 








+ 


l E[ xf ] + 


(N-1) 

N 


m 2 

m x 




* (N 


-E[x 


2 1)- 


; ( E[xf])- 


(N- 
N 


D 2 




(N- 
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Vf 


<fl) 


(N-1) 
rr 
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x 






(N- 


D 2 













N 



(58) 



(59) 



(60) 



(61) 



(62) 



(63) 



It is apparent from equation (63) that the mean value of the 
sample variance does not equal the variance and is thus 
biased. Note, however, for large N the mean of the sample 
variance asymptotically approaches the variance and the 
estimate virtually becomes unbiased. Next, to check for 
consistency, we will proceed to determine the variance of 
the estimate sample variance. For ease of understanding, 
assume that the process is zero mean, then letting 

N-1 



so that, 



1 y 

N Z^ 



: 
N N 



E[ * 2] -^X X E[X?X * ] 



(64) 



(65) 



j^[NE[x 4 n ] + N(N-1)(E[x 2 n ])2| (66) 



[E[x 4 n ] + (N - 1)(E[x 2 n ])2J (67) 



E[v|/] = E[x3 



so finally 
var[> 2 ] = E[v|/2] - (E[v|/])2 

1 



N 



E[x 4 ] - (E[x 2 ]2 



(68) 

(69) 
(70) 



Re-examining equations (63) and (70) as N becomes large 
clearly indicates that the sample variance is a consistent 
estimate. Qualitatively speaking, the accuracy of this esti- 
mate depends upon the number of samples considered in 
the estimate. Note also that the procedures employed 
above typify the style of analysis used to characterize esti- 
mators. 

6.0 THE PERIODOGRAM 

The first method defines the sampled version of the Wiener- 
Khintchine relation, equations (31) and (32), where the pow- 
er spectral density estimate Sn (f) is the discrete Fourier 
transform of the autocorrelation estimate R^ (k) or 



s Nvv (f) 



X rn * 



(k) £ -iwkT 



(71) 



k = 



This assumes that x(n) is a discrete time random process 
with an autocorrelation function Rn xx (I<). 
For a finite sequence of data 

fx n forn = 0,1 N-1 

1 elsewhere 
called a rectangular data window, the sample autocorrela- 
tion function (sampled form of equation A.14-9) 



x(n) 



(72) 



R N xx ( k ) = jij X x(n)x(n + k) 



(73) 



can be substituted into equation (71) to give the spectral 
density estimate 

SN xx (f) = J|X N (f)l 2 

called the periodogram. 

Note. |XN(f)|2 - XN(f)XN, W - 



(74) 



XN 2 (f)r< 



XN 2 (f), 



i mag 



Hence, 



s Nxx (f) 



F R Nxx (k) = F E[x(n)x(n + k)] 



FiN xx (k)€-i» kT 



I 



X h X x(n)x<n ^ 



(75) 



-jcokT (7 6 ) 



SO letting 1 = d« nT e-jonT 



s Nxx (t) 



jwnT 



Z x(n: 

= — oo 

y x(n + k) 



(77) 



-ja)(n + k)T 



k = - GO 

and allowing the variable m = n + k 



SN xx (f) = ^X N (f)X N *(f) = - 
in which the Fourier transform of the signal is 



|X N (f)P 



X N (f) : 



I 



x(n) e-JwnT 



(78) 



(79) 



The current discussion leads one to believe that the perio- 
dogram is an excellent estimator of the true power spectral 
density S(f) as N becomes large. This conclusion is false 
and shortly it will be verified that the periodogram is, in fact, 
a poor estimate of S(f). To do this, both the expected value 
and variance of S|\| xx (f) will be checked for consistency as N 
becomes large. As a side comment it is generally faster to 
determine the power spectral density, S|yj xx (f), of the ran- 
dom process using equation (74) and then inverse Fourier 
transforming to find RN xx (k) than to obtain Rn xx C<) directly. 
Further, since the periodogram is seen to be the magnitude 
squared of the Fourier transformed data divided by N, the 
power spectral density of the random process is unrelated 
to the angle of the complex Fourier transform X^ff) of a 
typical realization. 

Prior to checking for the consistency of SN xx (f), the sample 
autocorrelation must initially be found consistent. Proceed- 
ing, since the sample autocorrelation estimate 

RN xx (k) = (80) 

x(0)x(k) + x(1)x(|k| + 1) + ... + x(N-1-|k|)x(N-1) 

N 



N-1-|k| 

= i\i X x(n)x(n + |k|) 

n = 
k = 0, ±1, ±2,..., +N - 1 



(81) 



which averages together all possible products of samples 
separated by a lag of k, then, the mean value of the esti- 
mate is related to the true autocorrelation function by 
isl — 1 — k| 

L[R Nxx< k > ] I h ^ E[x(n)x(n + |k|)] 



N 



(82) 



N~| k| 
N 



R(k) 



where the true autocorrelation function R(k) is defined as 
(the sample equivalent of equation A.14-8) 

R(k) = E[x(n)x(n + k)] (83) 



From equation (82) it is observed that RN xx (k) ' s a biased 
estimator. It is also considered to be asymptotically unbi- 
ased since the term 

N-lkl 



N 
approaches 1 as N becomes large. From these observa- 
tions R|\i xx (k) can De classified as a good estimator of R(k). 
In addition to having a small bias, a good estimator should 
also have a small variance. The variance of the sample au- 
tocorrelation function can thus be computed as 

var[R Nxx (k)] = E[R 2 N>ix (k)] - E2[R Nxx (k)] (84) 

Examining the E[Rn (k)] term of equation (84), substituting 
the estimate of equation (81) and replacing n with m, it fol- 
lows that 

N -1 - Ikl 



E[R 2 N „(k)] = E 



1 
N2 



r r N -1 - |k| 

\ \ X x<n)x(n 

y\- n = 

N - 1 - 

N A^ 

m = C 

1 - |k| 

I I 

n = m = 



N - 1 - |k| 

x(m)x(m + 

m = 
N - 1 - Ikl N - 1 - Ikl 



(85) 



(86) 



E[x(n)x(n + |k|)x(m)x(m + 



>) 



If the statistical assumption that x(n) is a zero-mean Gauss- 
ian process, then the zero-mean, jointly Gaussian, random 
variables symbolized as X-i, X2, X3 and X4 of equation (86) 
can be described as [Ref. (30)]. 
E[XiX 2 X 3 X 4 ] = E[X-,X 2 ] E[X 3 X 4 ] + ElX^g] E[X 2 X 4 ] 

+ E[X-,X 4 ] E[X 2 X 3 ] < 87 > 
= E[x(n) x(n + |k|)] E[x(m) x(m + |k|)] 
+ E[x(n)x(m)]E[x(n + |k|)x(m + |k|] (88) 
+ E[x(n)x(m + |k|)] E[x(n + |k|)x(m)] 



Using equation (88), equation (84) becomes 

N - 1 - |k| N - 1 - |k 



— 1 — K IN — \ — 

I I 



- m = (89) 

RN xx (k)RN xx (k) + Rn xx (" - m) R Nxx (n - m) 

+ R N XX (n - m - |k|) R Nxx (n - m + |k|; 

j!j X RN >«< (k) 

L n = 

N — 1 — Ikl N — 1 — Ikl 



Var[R N xy (k)] 



1 

N2 



Z Z j Rl,jn 



n = m = 

Rn»> - m ~ k)RN yx ( n - m 



m) 
(90) 



where the lag term n — m was obtained from the lag differ- 
ence between t = n — m = (n + k) — (m + k) in the 
second term of equation (88). The lag term n — k + m and 
n — k — m was obtained by referencing the third term in 
equation (88) to n, therefore for 

E[x(n)x(m + |k|)] (91) 

the lag term t = n - (m + |k|) so 

E[x(n) x(m + |k|)] = R Nxx (n - m + |k|) (92) 

and for 

E[x(n + |k|)x(m)] (93) 

first let n - m then add |k| so t = n - m + |k| and 

E[x(n + |k|)x(m)] = R Nxx (n - m + |k|) (94) 

Recall that a sufficient condition for an estimator to be con- 
sistent is that its bias and variance both converge to zero as 
N becomes infinite. This essentially says that an estimator is 
consistent if it converges in probability to the true value of 
the quantity being estimated as N approaches infinity. 
Re-examining equation (90), the variance of RN xx (k), and 
equation (82), the expected value of RN xx (k), it is found that 
both equations tend toward zero for large N and therefore 
RN xx (k) is considered as a consistent estimator of R(k) for 
fixed finite k in practical applications. 
Having established that the autocorrelation estimate is con- 
sistent, we return to the question of the periodogram con- 
sistency. 

At first glance, it might seem obvious that S|\| xx (f) should 
inherit the asymptotically unbiased and consistent proper- 
ties of Rn (k), of which it is a Fourier transform. Unfortu- 
nately, however, it will be shown that S|\| xx (f) does not pos- 
sess these nice statistical properites. 
Going back to the power spectral density of equation (71). 

S N xx (f>= X ^xxM^ 1 

k = -oo 

and determining its expected value 



E[S N „(f)] 



I 



E[R Nxx (k)] £-i«*T (95) 



k = -GO 

the substitution of equation (82) into equation (95) yields the 
mean value estimate 

N 
E[S Nxx (f)]= V R < k > ( 1 ~ N ) €_ia>kT (96) 



the 1 



N 



term of equation (96) can be interpreted as a(k), the triangu- 
lar window resulting from the autocorrelation of finite-se- 
quence rectangular-data window co(k) of equation (72). 
Thus, 

|k| 



a(k) = 



1 



N 



|k| < N - 1 (97a) 
elsewhere (97b) 



and the expected value of the periodogram can be written 
as the finite sum „ 



E[S N m] 



I 



RNwM a(k) e-i»kT (98) 



Note from equation (98) that the periodogram mean is the 
discrete Fourier transform of a product of the true auto- 
correlation function and a triangular window function. This 
frequency function can be expressed entirely in the frequen- 
cy domain by the convolution integral. From equation (98), 
then, the convolution expression for the mean power spec- 
tral density is thus, 



E[S N m] 



J -V: 



S^Aff-^dT, 



(99) 



where the general frequency expression for the transformed 
triangular window function A(f) is 



A(f) 



sin (27rf) - 
2 

. (27Tf) 



(100) 



Re-examining equation (98) or (96) it can be said that equa- 
tion (71) or (74) gives an asymptotically unbiased estimate 
of S(f) with the distorting effects of a(k) vanishing as N 
tends toward infinity. At this point equation (98) still appears 
as a good estimate of the power spectral density function. 
For the variance var [Sn (f)] however, it can be shown 
[Ref. (10)] that if the data sequence x(n) comes from a 
Gaussian process then the variance of S|\| xx (f) approaches 
the square of the true spectrum, S 2 (f), at each frequency f. 
Hence, the variance is not small for increasing N, 

N m _* «, var[S Nxx (f)] = S2(f) (101) 

More clearly, if the ratio of mean to standard deviation is 
used as a kind of signal-to-noise ratio, i.e. 
E[S Nxx (f)] S(f) 

(var[S Nxx (f)])/ 2 s ( f ) 
it can be seen that the true signal spectrum is only as large 
as the noise or uncertainty in S^ (f) for increasing N. In 
addition, the variance of equation (101), which also is ap- 
proximately applicable for non-Gaussian sequences, indi- 
cates that calculations using different sets of N samples 
from the same x(n) process will yield vastly different values 
of S|\| xx (f) even when N becomes large. Unfortunately, since 
the variance of SN xx (f) does not decrease to zero as N ap- 
proaches infinity, the periodogram is thus an inconsistent 
estimate of the power spectral density and cannot be used 
for spectrum analysis in its present form. 

7.0 SPECTRAL ESTIMATION BY 
AVERAGING PERIODOGRAMS 

It was shown in the last section that the periodogram was 
not a consistent estimate of the power spectral density 



function. A technique introduced by Bartlett, however, al- 
lows the use of the periodogram and, in fact, produces a 
consistent spectral estimation by averaging periodograms. 
In short, Bartlett's approach reduces the variance of the 
estimates by averaging together several independent perio- 
dograms. If, for example Xi , X2, X3 Xl are uncorrelated 

random variables having an expected value E[x] and a vari- 
ance 0-2, then the arithmetic mean 

Xi + X 2 + X 3 + ... + Xi 

— = (103) 

has the expected value E[x] and a variance of 0-2/L. This 
fact suggests that a spectral estimator can have its variance 
reduced by a factor of L over the periodogram. The proce- 
dure requires the observed process, an N point data se- 
quence, to be split up into L nonoverlapping M point sec- 
tions and then averaging the periodograms of each individu- 
al section. 

To be more specific, dividing an N point data sequence x(n), 
< n < N - 1, into L segments of M samples each the 
segments X^(n) are formed. Thus, 



< n < M - 1 



(104) 



XmW = x(n + /M - M) 

1 < / < L 

where the superscript / specifies the segment or interval of 
data being observed, the subscript M represents the num- 
ber of data points or samples per segment and depending 
upon the choice of L and M, we have N > LM. For the 
computation of L periodograms 

M - 1 



SmW 






XM(n)e-i«nT 



2 1</ <l_ (105) 



n = 

If the autocorrelation function Fi^ (m) becomes negligible 
for m large relative to M, m > M, then it can be said that the 
periodograms of the separate sections are virtually indepen- 
dent of one another. The corresponding averaged periodo- 
gram estimator S M (f) computed from L individual periodo- 
grams of length M is thus defined 
L 



SmW 



l X S M<f> 



(106) 



/ = 1 
Since the L subsidiary estimates are identically distributed 
periodograms, the averaged spectral estimate will have the 
same mean or expected value as any of the subsidiary esti- 
mates so 

L 



E[Sm(0] =[ A EIS " (f)] 



E[S^(f)] 



(107) 



(108) 



From this, the expected value of the Bartlett estimate can 
be said to be the convolution of the true spectral density 
with the Fourier transform of the triangular window function 
corresponding to the M sample periodogram where M<N/L 
equations (98) or (99) we see that 



E[Sm(0] = E[S^,(f)] = 1 j* S(r,)A(f - r,)dr, 



where A(f) is the Fourier transformed triangular window 
function of equation (100). Though the averaged estimate is 
no different than before, its variance, however, is smaller. 
Recall that the variance of the average of L identical inde- 
pendent random variables is 1/L of the individual variances, 
equation (51). Thus, for L statistically independent periodo- 
grams, the variance of the averaged estimate is 



r[SM(f>] 



^var[S Nxx (f)] ^[S(f)]2 



(110) 



(109) 



So, again, the averaging of L periodograms results in ap- 
proximately a factor of L reduction in power spectral density 
estimation variance. Since the variance of equation (110) 
tends to zero as L approaches infinity and through equation 
(98) and (99) S^ff) is asymptotically unbiased, Sy(f) can be 
said to be a consistent estimate of the true spectrum. 
A few notes are next in order. First, the L fold variance 
reduction or (L)Vi signal-to-noise ratio improvement of equa- 
tion (102) is not precisely accurate since there is some de- 
pendence between the subsidiary periodograms. The adja- 
cent samples will correlated unless the process being ana- 
lyzed is white. 

However, as indicated in equation (110), such a depen- 
dence will be small when there are many sample intervals 
per periodogram so that the reduced variance is still a good 
approximation. Secondly, the bias of S^ff), equation (106), 
is greater than S M (f), equation (105), since the main lobe of 
the spectral window is larger for the former. For this situa- 
tion, then, the bias can be thought of as effecting spectral 
resolution. It is seen that increasing the number of periodo- 
grams for a fixed record length N decreases not only the 
variance but, the samples per periodograms M decrease 
also. This decreases the spectral resolution. Thus when us- 
ing the Bartlett procedure the actual choice of M and N will 
typically be selected from prior knowledge of a signal or 
data sequence under consideration. The tradeoff, however, 
will be between the spectral resolution of bias and the vari- 
ance of the estimate. 

8.0 WINDOWS 

Prior to looking at other techniques of spectrum estimation, 
we find that to this point the subject of spectral windows has 
been brought up several times during the course of our dis- 
cussion. No elaboration, however, has been spent explain- 
ing their spectral effects and meaning. It is thus appropriate 
at this juncture to diverge for a short while to develop a 
fundamental understanding of windows and their spectral 
implications prior to the discussion of Sections 9 and 10 (for 
an in depth study of windows see Windows, Harmonic Anal- 
ysis and the Discrete Fourier Transform; Frederic J. Harris; 
submitted to IEEE Proceedings, August 1976). 
In most applications it is desirable to taper a finite length 
data sequence at each end to enhance certain characteris- 
tics of the spectral estimate. The process of terminating a 
sequence after a finite number of terms can be thought of 
as multiplying an infinite length, i.e., impulse response se- 
quence by a finite width window function. In other words, the 
window function determines how much of the original im- 
pulse sequence can be observed through this window, 



10 



see Figures 4a, 4b, and 4c. This tapering by mulitiplying the 
sequence by a data window is thus analogous to multiplying 
the correlation function by a lag window. In addition, since 
multiplication in the time domain is equivalent to convolution 
in the frequency domain then it is also analogous to con- 
volving the Fourier transform of a finite-length-sequence 
with the Fourier transform of the window function, Figures 
4d, 4e, and 4f. Note also that the Fourier transform of the 
rectangular window function exhibits significant oscillations 
and poor high frequency convergence, Figure 4e. Thus, 
when convolving this spectrum with a desired amplitude 
function, poor convergence of the resulting amplitude re- 
sponse may occur. This calls for investigating the use of 
other possible window functions that minimize some of the 
difficulties encountered with rectangular function. 



(a) 

bit) 



(d) 



(b) 





(c) 

TL/H/8712-5 

FIGURE 4 

In order for the spectrum of a window function to have mini- 
mal effects upon the desired amplitude response, resulting 
from convolving two functions, it is necessary that the win- 
dow spectrum approximate an impulse function. This im- 
plies that as much of its energy as possible should be con- 
centrated at the center of the spectrum. Clearly, an ideal 
impulse spectrum is not feasible since this requires an infi- 
nitely long window. 

In general terms, the spectrum of a window function typical- 
ly consists of a main lobe, representing the center of the 
spectrum, and various side lobes, located on either side of 
the main lobe (see Figures 6 thru 9). It is desired that the 
window function satisfy two criteria; (1) that the main lobe 
should be as narrow as possible and (2) relative to the 



main lobe, the maximum side lobe level should be as small 
as possible. Unfortunately, however, both conditions cannot 
be simultaneously optimized so that, in practice, usable win- 
dow functions represent a suitable compromise between 
the two criteria. A window function in which minimization of 
the main lobe width is the primary objective, fields a finer 
frequency resolution but suffers from some oscillations, i.e., 
the spectrum passband and substantial ripple in the spec- 
trum stopband. Coversely, a window function in which mini- 
mization of the side lobe level is of primary concern tends to 
have a smoother amplitude response and very low ripple in 
the stopband but, yields a much poorer frequency resolu- 
tion. Examining Figure 5 assume a hypothetical impulse re- 
sponse, Figure 5a, whose spectrum is Figure 5b. Multiplying 
the impulse response by the rectangular window, Figure 4b, 
yields the windowed impulse response, Figure 5c, implying 
the convolution of the window spectrum, Figure 4e, with the 
impulse response spectrum, Figure 5b. The result of this 
convolution is seen in Figure 5d and is a distorted version of 
the ideal spectrum, Figure 5b, having passband oscillations 
and stopband ripple. Selecting another window, i.e., Figure 
9 with more desirable spectral characteristics, we see the 
appropriately modified windowed data, Figure 5e, results in 
a very good approximation of Figure 5b. 
This characteristically provides a smoother passband and 
lower stopband ripple level but sacrifices the sharpness of 
the roll-off rate inherent in the use of a rectangular window 
(compare Figures 5d and 5f ). Concluding this brief discus- 
sion, a few common window functions will next be consid- 
ered. 



Mi 



ID-, 



(a) 

i„<t)-b(l 



(b) 



<c> 

a„(tl-t(tl 




(8) 



(I) 



TL/H/8712- 

FIGURE 5. (a)(b) Unmodified Data Sequence 

(c)(d) Rectangular Windowed Data Sequence 

(e)(f) Hamming Windowed Data Sequence 
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Rectangular window: Figure 6 

N - 1 



w(n) = 1 



|n|<- 



2 

= otherwise 

Bartlett or triangular window: Figure 7 



w(n) = 1 







2jn] 
N 



n <■ 



N - 1 



2 
otherwise 



Hann window: Figure 8 
w(n) = 0.5 + 0.5 cos 



27rn 



|n|< 



N - 1 



2 
otherwise 



Hamming window: Figure 9 



2im\ . . N - 1 
w(n) = 0.54 + 0.46 cos I I |n| < 

= otherwise 



Again the reference previously cited should provide a more 
detailed window selection. Nevertheless, the final window 

(111) choice will be a compromise between spectral resolution 
and passband (stopband) distortion. 

9.0 SPECTRAL ESTIMATION BY USING WINDOWS TO 
SMOOTH A SINGLE PERIODOGRAM 

(112) It was seen in a previous section that the variance of a 
power spectral density estimate based on an N point data 
sequence could be reduced by chopping the data into short- 
er segments and then averaging the periodograms of the 
individual segments. This reduced variance was acquired at 

(1 1 3) the expense of increased bias and decreased spectral reso- 
lution. We will cover in this section an alternate way of com- 
puting a reduced variance estimate using a smoothing oper- 
ation on the single periodogram obtained from the entire N 
point data sequence. In effect, the periodogram is 

(114) smoothed by convolving it with an appropriate spectral win- 
dow. Hence if Sxx(f) denotes the smooth periodogram then, 




SwxxO 



= JT 2 1/2 S N XX W W (f- 



T,)dr) = S Nxx (T,)*W(7,) (115) 




ivm 



^AA/y/^- Ajwv- 



FIGURE 6. Rectangular Window 







FIGURE 8. Hann Window 



TL/H/8712-8 





TL/H/8712- 

FIGURE 7. Bartlett or Triangular Window 




FIGURE 9. Hamming Window 
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where W(f — t;) is the spectral window and * denotes con- 
volution. Since the periodogram is equivalent to the Fourier 
transform of the autocorrelation function Rn xx (I<) then, using 
the frequency convolution theorem 

F(x(t)y(t)} = X(f) « Y(r, - f) (116) 

where F ( ) denotes a Fourier transform, Sxx(f) is the Fouri- 
er transform of the product of RN xx (k) and the inverse Fouri- 
er transform of W(f). Therefore for a finite duration window 
sequence of length 2K - 1, 
K - 1 



where 



S W xx (f)= V RN xx (k)w(k) £ -iwkT 
(K-1) 

y w(f> £ i»>kT a f 

-V4 



w(k) 



(117) 



(118) 



References (10)(16)(21) proceed further with a develop- 
ment to show that the smoothed single windowed periodo- 
gram is a consistent estimate of the power spectral density 
function. The highlights of the development, however, show 
that a smoothed or windowed spectral estimate, Sw xx (f). 
can be made to have a smaller variance than that of the 
straight periodogram, S|M xx (f), by (S the variance ratio rela- 
tionship 

M - 1 
_ var [S Wxx (f)] _ 1 

~ var [S N (f)] " N 

m = -(M - 1) 



I 



w 2 (m) 



(119) 



1TV4 , 
Nj -V4 



W2(f) df 



where N is the record length and 2M - 1 is the total window 
width. Note that the energy of the window function can be 
found in equation (119) as 



M 



I 



w 2 (m) 



m = -(M - 1) 



J -V, 



W2(f)df ( 12 0) 



Continuing from equation (119), it is seen that a satisfactory 
estimate of the periodogram requires the variance of 
s w xx (f) t0 be smal1 compared to Sn xx so that 

< 1 (121) 

Therefore, it is the adjusting of the length and shape of the 
window that allows the variance of Sw xx (f) to be reduced 
over the periodogram. 

Smoothing is like a low pass filtering effect, and so, causes 
a reduction in frequency resolution. Further, the width of the 
window main lobe, defined as the symmetric interval be- 
tween the first positive and negative frequencies at which 
W(f) = 0, in effect determines the bandwidth of the 
smoothed spectrum. Examining Table I for the following de- 
fined windows; 
Rectangular window 
w(m) = 1 |m| < M - 1 (122) 

= otherwise 



Bartlett or triangular window 






I ml . . 
w(m) = 1 - i—j- |m| < M - 1 




(123) 


= otherwise 






Hann window 






. , ( 7rm \ 


|m| < M - 1 


(124) 


" v "' "'" ' "'■'""*' Vm- iJ 


= 


otherwise 




Hamming window 






I irm \ 


|m| < M - 1 


(125) 


" v "' """ ' ""™ w "Vm - 1/ 


= 


otherwise 





We see once again, as in the averaging technique of spec- 
tral estimation (Section 7), the smoothed periodogram tech- 
nique of this discussion also makes a trade-off between var- 
iance and resolution for a fixed N. A small variance requires 
a small M while high resolution requires a large M. 
TABLE I 



Window 
Function 


Width of 

Main Lobe" 

(approximate) 


Variance Ratio* 
(approximate) 


Rectangular 


277 

~M~ 


2M 

"n~ 


Bartlett 


Ait 


2M 
3N 


Hann 


3tt 
M 


2M 


' (0.5)2 + (0.5)2" 
2 




N 


Hamming 


3tt 


2M 


(0.54)2 + (0.46)2 
2 




N 



10.0 SPECTRAL ESTIMATION BY AVERAGING 
MODIFIED PERIODOGRAMS 

Welch [Ref. (36)(37)] suggests a method for measuring 
power spectra which is effectively a modified form of Bart- 
lett's method covered in Section 7. This method, however, 
applies the window w(n) directly to the data segments be- 
fore computing their individual periodograms. If the data se- 
quence is sectioned into 

L^ 
M 



segments of M samples each as defined in equation (104), 
the L modified or windowed periodogram can be defined as 

M - 1 



Im(t) 



1 
UM 



1 



XM<n)w(n) e-io>nT 



1 < / < L (126) 
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where U, the energy in the window is 
M = 1 



u = iX w2(n) 



(127) 



Note the similarity between equations (126) and (105) and 
that equation (126) is equal to equation (105) modified by 
functions of the data window w(n). 
The spectral estimate I M is defined as 

L 



/(f) 



(128) 



Im(0 = I V Im 

/ = 1 
ilue is given by 

z - [I M (f)l = I " S Nxx (ij) W (f - r,) di) = S Nxx (ij) ' W(r,) 29) 

J _1 /2 



/ = 1 

and its expected value is given by 



where 



W(f) 



UM 



/ w(n; 



i) t-junT 



(130) 



The normalizing factor U is required so that the spectral 
estimate lm(f), of the modified periodogram, lm(f), will be 
asymptotically unbiased [Ref. (34)]. If the intervals of x(n) 
were to be nonoverlapping, then Welch [Ref. (37)] indicates 
that 



vartlm(f)l s r var [ s Nxx( f >l 



[S(f)]2 



(131) 



which is similar to that of equation (1 1 0). Also considered is 
a case where the data segments overlap. As the overlap 
increases the correlation between the individual periodo- 
grams also increase. We see further that the number of M 
point data segments that can be formed increases. These 
two effects counteract each other when considering the var- 
iance l m (f). With a good data window, however, the in- 
creased number of segments has the stronger effect until 
the overlap becomes too large. Welch has suggested that a 
50 percent overlap is a reasonable choice for reducing the 
variance when N if fixed and cannot be made arbitrarily 
large. Thus, along with windowing the data segments prior 
to computing their periodograms, achieves a variance re- 
duction over the Bartlett technique and at the same time 
smooths the spectrum at the cost of reducing its resolution. 
This trade-off between variance and spectral resolution or 
bias is an inherent property of spectrum estimators. 

11.0 PROCEDURES FOR POWER SPECTRAL 
DENSITY ESTIMATES 

Smoothed single periodograms [Ref. (21)(27)(32)] 

A. 1 . Determine the sample autocorrelation Rn xx (I<) of the 
data sequence x(n) 

2. Multiply RN xx (k) by an appropriate window function 
w(n) 

3. Compute the Fourier transform of the product 

RNxx(k) w(n) <— >• S Wxx (f) (71) 



B. 1 . Compute the Fourier transform of the data sequence 
x(n) 
x(n) <— >• X(f) 
2. Multiply X(f) by its conjugate to obtain the power spec- 
tral density S N (f) 



S N xx <f) = ^|X(f)|2 



(74) 



3. Convolve S^ (f) with an appropriate window function 
W(f) 
Sw xx (f) = S Nxx (f) * W(f) (115) 

C. 1. Compute the Fourier transform of the data sequence 
x(n) 
x(n) <— >• X(f) 

2. Multiply X(f) by its conjugate to obtain the power spec- 
tral density S Nxx (f) 

S Nxx (f) = ^|X(f)|2 (74) 

3. Inverse Fourier transform S[\| xx (f) to get RN xx ( k ) 

4. Multiply Rn xx (I<) by an appropriate window function 
w(n) 

5. Compute the Fourier transform of the product to obtain 
s Wxx (f) 

Sw xx (0 *-* RN xx (k) • w(n) (117) 

Averaging periodograms [Ref. (32)(37)(38)] 
A. 1 . Divide the data sequence x(n) into L < N/M segments, 
Xy(n) 

2. Multiply a segment by an appropriate window 

3. Take the Fourier transform of the product 

4. Multiply procedure 3 by its conjugate to obtain the 
spectral density of the segment 

5. Repeat procedures 2 through 4 for each segment so 
that the average of these periodogram estimates pro- 
duce the power spectral density estimate, equation 
(128) 

12.0 RESOLUTION 

In analog bandpass filters, resolution is defined by the filter 
bandwidth, Afew. measured at the passband half power 
points. Similarly, in the measurement of power spectral den- 
sity functions it is important to be aware of the resolution 
capabilities of the sampled data system. For such a system 
resolution is defined as 

Mb\N = t^ (132) 



and for; 
Correlation resolution 

AfBW = 



NT 



T max = m "T, where m 
is the maximum value 
allowed to produce 
T max , the maximum lag 
term in the correlation 
computation 



(133) 
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Fourier transform (FFT) resolution 



Af R 



1 m 1 where P is the record length, 
p. — Nj — |_j N, the number of data points 
and m, the samples within 
each L segment, 
N 

L = m- < 134 > 

M 

Note that the above Afew's can be substantially poorer de- 
pending upon the choice of the data window. A loss in de- 
grees of freedom (Section 13) and statistical accuracy oc- 
curs when a data sequence is windowed. That is, data at 
each end of a record length are given less weight than the 
data at the middle for anything other than the rectangular 
window. This loss in degrees of freedom shows up as a loss 
in resolution because the main lobe of the spectral window 
is widened in the frequency domain. 
Finally, the limits of a sampled data system can be de- 
scribed in terms of the maximum frequency range and mini- 
mum frequency resolution. The maximum frequency range 
is the Nyquist or folding frequency, f c , 

fc = J = ^r (135) 

where f s is the sampling frequency and T s the sampling 
period. And, secondly, the resolution limit can be described 
in terms of a (Af B w) NT product where 



Afew ^ 



1 
NT 



(136) 



(Af BW )NT>1 (137) 

13.0 CHI-SQUARE DISTRIBUTIONS 

Statistical error is the uncertainty in power spectral density 
measurements due to the amount of data gathered, the pro- 
babilistic nature of the data and the method used in deriving 
the desired parameter. Under reasonable conditions the 
spectral density estimates approximately follow a chi- 
square, x n . distribution. x„ is defined as the sum of the 
squares of x n > 1 s n < N, independent Gaussian variables 
each with a zero mean and unit variance such that 

N 



2 _ \ ^ 2 



(138) 



The number n is called the degrees of freedom and the Xn 
probability density function is 



f( Xn) 



2 n/ 2r (H) 



"-2-1 — Xn 



(Xn) 



(139) 



where rl j I is the statistical gamma function (Ref. (14)]. 



Figure 10 shows the probability density function for several 
n values and it is important to note that as n becomes large 
the chi-square distribution approximates a Gaussian distri- 
bution. We use this x 2 distribution in our analysis to discuss 
the variability of power spectral densities. If x n has a zero 
mean and N samples of it are used to compute the power 
spectral density estimate S(f) then, the probability that the 
true spectral density, S(f), lies between the limits 

A<S(f)<B (140) 

is 

P = (1 - a) = probability (141) 

flv 2 „l 




TL/H/8712-10 



FIGURE 10 

The lower A and upper B limits are defined as 
nSffl 



and 



B 



nS(f) 



respectively, x 2 . is defined by 

r oo 

x 2 n . a = [v so that f(x 2 n )dx 2 n 

J V 



(142) 



(143) 



(144) 



see Figure 11 and the interval A to B is referred to as a 
confidence interval. From Otnes and Enrochson [Ref. (35) 
pg. 217] the degrees of freedom can be described as 

n = 2(Af BW ) NT = 2(Af BW ) P|_ (145) 

f<x 2 n> 
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FIGURE 11 
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and that for large n i.e., n > 30 the Xn distribution ap- 
proaches a Gaussian distribution so that the standard devia- 
tion or standard error, £ , can be given by 

£ ° = TOT (146) 

The degrees of freedom and associated standard error for 

the correlation and Fourier transform are as follows: 

2N Im 

correlation: n = — £ = ,/— (147) 



FFT: n 



m 
2M 



(148) 
(149) 



where M is the number of |X(f)| 2 values 

M = NT (Af BW )desired 
and m is the maximum lag value. 

An example will perhaps clarify the usage of this informa- 
tion. 

Choosing T = 100 ms, N = 8000 samples and n = 20 
degrees of freedom then 

1 

It ' 

n = 2(NT) (Af BW ) 
so 



ffi 



5 Hz 



Af B w : 



0.0125 Hz 



2NT 

If it is so desired to have a 95% confidence level of the 
spectral density estimate then 
P = (1 - a) 
0.95 = 1 - a 

a = 1 - 0.95 = 0.05 
the limits 

nS(f) 20S(f) 







2 


2 






*n;1 - a/2 


^20; 0.975 




A 


nS(f) 

2 


20S(f) 

2 






•^n;a/2 


^20; 0.025 


yield from Table 


II 










2 
^ 20; 0.975 


9.59 






2 

^20; 0.025 


34.17 


so that 












20S(f) 

< S(f) 


20S(f) 



34.17 9.59 

0.5853 S(f) < S(f) < 2.08 S(f) 



fn 



500 Hz 



0.158 



There is thus a 95% confidence level that the true spectral 
density function S(f) lies within the interval 0.5853 S(f) < 
S(f) < 2.08 S(f). 

As a second example using equation (148) let T = 1 ms, 
N = 4000 and it is desired to have (Afew) desired = 10 Hz. 
Then, 
NT = 4 
1 
2T ' 

£ ° = Vm = VnT (Af BW ) desired 

N = 2M = 2NT (Af BW ) des ired = 80 
If it is again desired to have a 95% confidence level of the 
spectral density estimate then 

a = 1 - p = 0.05 

^80; 0.975 = 575 
X 2 8 0;0.025= 106 - 63 

and we thus have a 95% confidence level that the true 
spectral density S(f) lies within the limits 

0.75 S(f) < S(f) < 1 .39 S(f) 
It is important to note that the above examples assume 
Gaussian and white data. In practical situations the data is 
typically colored or correlated and effectively results in re- 
ducing number of degrees of freedom. It is best, then, to use 
the white noise confidence levels as guidelines when plan- 
ning power spectral density estimates. 

14.0 CONCLUSION 

This article attempted to introduce to the reader a conceptu- 
al overview of power spectral estimation. In doing so a wide 
variety of subjects were covered and it is hoped that this 
approach left the reader with a sufficient base of "tools" to 
indulge in the mounds of technical literature available on the 
subject. 
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TABLE II. Percentage Points of the Chi-Square Distribution 



n 


a 


0.995 


0.990 


0.975 


0.950 


0.050 


0.025 


0.010 


0.005 


1 


0.000039 


0.00016 


0.00098 


0.0039 


3.84 


5.02 


6.63 


7.88 


2 


0.0100 


0.0201 


0.0506 


0.1030 


5.99 


7.38 


9.21 


10.60 


3 


0.0717 


0.115 


0.216 


0.352 


7.81 


9.35 


11.34 


12.84 


4 


0.207 


0.297 


0.484 


0.711 


9.49 


11.14 


13.28 


14.86 


5 


0.412 


0.554 


0.831 


1.150 


11.07 


12.83 


15.09 


16.75 


6 


0.68 


0.87 


1.24 


1.64 


12.59 


14.45 


16.81 


18.55 


7 


0.99 


1.24 


1.69 


2.17 


14.07 


16.01 


18.48 


20.28 


8 


1.34 


1.65 


2.18 


2.73 


15.51 


17.53 


20.09 


21.96 


9 


1.73 


2.09 


2.70 


3.33 


16.92 


19.02 


21.67 


23.59 


10 


2.16 


2.56 


3.25 


3.94 


18.31 


20.48 


23.21 


25.19 


11 


2.60 


3.05 


3.82 


4.57 


19.68 


21.92 


24.72 


26.76 


12 


3.07 


3.57 


4.40 


5.23 


21.03 


23.34 


26.22 


28.30 


13 


3.57 


4.11 


5.01 


5.89 


22.36 


24.74 


27.69 


29.82 


14 


4.07 


4.66 


5.63 


6.57 


23.68 


26.12 


29.14 


31.32 


15 


4.60 


5.23 


6.26 


7.26 


25.00 


27.49 


30.58 


32.80 


16 


5.14 


5.81 


6.91 


7.96 


26.30 


28.85 


32.00 


34.27 


17 


5.70 


6.41 


7.56 


8.67 


27.59 


30.19 


33.41 


35.72 


18 


6.26 


7.01 


8.23 


9.39 


28.87 


31.53 


34.81 


37.16 


19 


6.84 


7.63 


8.91 


10.12 


30.14 


32.85 


36.19 


38.58 


20 


7.43 


8.26 


9.59 


10.85 


31.41 


34.17 


37.57 


40.00 


21 


8.03 


8.90 


10.28 


11.59 


32.67 


35.48 


38.93 


41.40 


22 


8.64 


9.54 


10.98 


12.34 


33.92 


36.78 


40.29 


42.80 


23 


9.26 


10.20 


11.69 


13.09 


35.17 


38.08 


41.64 


44.18 


24 


9.89 


10.86 


12.40 


13.85 


36.42 


39.36 


42.98 


45.56 


25 


10.52 


11.52 


13.12 


14.61 


37.65 


40.65 


44.31 


46.93 


26 


11.16 


12.20 


13.84 


15.38 


38.89 


41.92 


45.64 


48.29 


27 


11.81 


12.88 


14.57 


16.15 


40.11 


43.19 


46.96 


49.64 


28 


12.46 


13.56 


15.31 


16.93 


41.34 


44.46 


48.28 


50.99 


29 


13.12 


14.26 


16.05 


17.71 


42.56 


45.72 


49.59 


52.34 


30 


13.79 


14.95 


16.79 


18.49 


43.77 


46.98 


50.89 


53.67 


40 


20.71 


22.16 


24.43 


26.51 


55.76 


59.34 


63.69 


66.77 


50 


27.99 


29.71 


32.36 


34.76 


67.50 


71.42 


76.15 


79.49 


60 


35.53 


37.48 


40.48 


43.19 


79.08 


83.80 


88.38 


91.95 


70 


43.28 


45.44 


48.76 


51.74 


90.53 


95.02 


100.43 


104.22 


80 


51.17 


53.54 


57.15 


60.39 


101.88 


106.63 


112.33 


116.32 


90 


59.20 


61.75 


65.65 


69.13 


113.14 


118.14 


124.12 


128.30 


100 


67.33 


70.06 


74.22 


77.93 


124.34 


129.56 


135.81 


140.17 
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APPENDIX A 

A.0 CONCEPTS OF PROBABILITY, RANDOM 
VARIABLES AND STOCHASTIC PROCESSES 

In many physical phenomena the outcome of an experiment 
may result in fluctuations that are random and cannot be 
precisely predicted. It is impossible, for example, to deter- 
mine whether a coin tossed into the air will land with its 
head side or tail side up. Performing the same experiment 
over a long period of time would yield data sufficient to indi- 
cate that on the average it is equally likely that a head or tail 
will turn up. Studying this average behavior of events allows 
one to determine the frequency of occurrence of the out- 
come (i.e., heads or tails) and is defined as the notion of 
probability. 

Associated with the concept of probability are probability 
density functions and cumulative distribution functions 
which find their use in determining the outcome of a large 
number of events. A result of analyzing and studying these 
functions may indicate regularities enabling certain laws to 
be determined relating to the experiment and its outcomes; 
this is essentially known as statistics. 

A.1 DEFINITIONS OF PROBABILITY 

If nA is the number of times that an event A occurs in N 
performances of an experiment, the frequency of occur- 
rence of event A is thus the ratio n^/N. Formally, the proba- 
bility, P(A), of event A occurring is defined as 

["a! 



P(A) 



lim 
N- 



(A.1-1) 



Where it is seen that the ratio n/^/U (or fraction of times that 
an event occurs) asymptotically approaches some mean 
value (or will show little deviation from the exact probability) 
as the number of experiments performed, N, increases 
(more data). 
Assigning a number, 

N ' 
to an event is a measure of how likely or probable the event. 

Since n^ and N are both positive and real numbers and < 
nA ^ N; it follows that the probability of a given event can- 
not be less than zero or greater than unity. Furthermore, if 
the occurrence of any one event excludes the occurrence of 
any others (i.e., a head excludes the occurrence of a tail in a 
coin toss experiment), the possible events are said to be 
mutually exclusive. If a complete set of possible events A-| 
to A n are included then 



"A-, 



"A 2 n A 3 



N 



1 (A.1 -2) 



Similarly, an event that is absolutely certain to occur has a 
probability of one and an impossible event has a probability 
of zero. 
In summary: 

1. < P(A) < 1 

2. P(A!) + P(A 2 ) + P(A 3 ) + . . . + P(A n ) = 1, for an 
entire set of events that are mutually exclusive 

3. P(A) = represents an impossible event 

4. P(A) = 1 represents an absolutely certain event 
A.2 JOINT PROBABILTY 

If more than one event at a time occurs (i.e., events A and B 
are not mutually excusive) the frequency of occurrence of 
the two or more events at the same time is called the joint 
probability, P(AB). If nAB is the number of times that event A 
and B occur together in N performances of an experiment, 
then 



P(A,B) 



lim 
N- 



"AB 
N 



(A.2-1) 



A.3 CONDITIONAL PROBABILITY 

The probability of event B occurring given that another 
event A has already occurred is called conditional probabili- 
ty. The dependence of the second, B, of the two events on 
the first, A, will be designated by the symbol P(B)|A) or 



P(B|A) 



"AB 
nA 



(A.3-1) 



where nAB is the number of joint occurrences of A and B 
and Na represents the number of occurrences of A with or 
without B. By dividing both the numerator and denominator 
of equation (A.3-1) by N, conditional probability P(B|A) can 
be related to joint probability, equation (A.2-1 ), and the prob- 
ability of a single event, equation (A.1-1) 
1 



P(B|A) 




P(A,B) 
P(A) 



Analogously 



P(A|B) - 



P(A,B) 
P(A) 



and combining equations (A.6) and (A. 7) 

P(A|B) P(B) = P(A, B) = P(B|A) P(A) 
results in Bayes' theorem 

P(A) P(B|A) 



P(A|B) 



P(B) 



(A.3-2) 



(A.3-3) 



(A.3-4) 



(A.3-5) 



P(A!) + P(A 2 ) + P(A 3 ) 



P(A n ) = 1 (A.1 -3) 
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Using Bayes' theorem, it is realized that if P(A) and P(B) are 
statistically independent events, implying that the probability 
of event A does not depend upon whether or not event B 
has occurred, then P(A|B) = P(A), P(B|A) = P(B) and 
hence the joint probability of events A and B is the product 
of their individual probabilities or 

P(A,B) = P(A) P(B) (A.3-6) 

More precisely, two random events are statistically indepen- 
dent only if equation (A.3-6) is true. 

A.4 PROBABILITY DENSITY FUNCTIONS 

A formula, table, histogram, or graphical representation of 
the probability or possible frequency of occurrence of an 
event associated with variable X, is defined as fx(x), the 
probability density function (pdf) or probability distribution 
function. As an example, a function corresponding to height 
histograms of men might have the probability distribution 
function depicted in Figure A.4. 1 . 

f x (x) 




D 4' 5' 6' 7' 

TL/H/8712-12 

FIGURE A.4.1 

The probability element, fx(x) dx, describes the probability 
of the event that the random variable X lies within a range of 
possible values between 
Ax\ 



- and 
2 I 



(•♦?) 



i.e., the area between the two points 5'5" and 5'7" shown 
in Figure A.4.2 represents the probability that a man's height 
will be found in that range. More clearly, 

(A.4-1) 

Ax 
x + — 
Ax\ I Ax\ 1(2 

~2 



Prob 



H4 + f)W Ax fxMdx 

x 

2 

f 57" 

fxM dx 

J 5'5" 



Prob[5'5" < X < 5'7"] 




PROBABILITY OF OCCURRENCE 
IN THE INTERVAL x, TO x 2 
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FIGURE A.4.2 

Continuing, since the total of all probabilities of the random 

variable X must equal unity and fx(x) dx is the probability 

that X lies within a specified interval 

Ax\ I Ax 

x- T jand^x- y 

then, 



f x (x) dx = 1 



(A.4-2) 



It is important to point out that the density function fx(x) is in 
fact a mathematical description of a curve and is not a prob- 
ability; it is therefore not restricted to values less than unity 
but can have any non-negative value. Note however, that in 
practical application, the integral is normalized such that the 
entire area under the probability density curve equates to 
unity. 

To summarize, a few properties of fx(x) are listed below. 
1 . fx(x) > for all values of x or - °° <x< °° 



2 'f 



fx(x) dx = 1 



3. Prob 



Ax\ 



2 / sx n x+ f) 



f x (x) dx 



J _ Ax 
2 

A.5 CUMULATIVE DISTRIBUTION FUNCTION 

If the entire set of probabilities for a random variable event 
X are known, then since the probability element, fx(x) dx, 
describes the probability that event X will occur, the accu- 
mulation of these probabilities from x= — °°tox=°ois 
unity or an absolutely certain event. Hence, 



FxM 



f X M dx = 1 



(A.5-1) 
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where Fx(x) is defined as the cumulative distribution func- 
tion (cdf) or distribution function and f x (x) is the pdf of ran- 
dom variable X. Illustratively, Figures A.5. 1a and A.5. 1b 
show the probability density function and cumulative distri- 
bution function respectively. 



«xW 




b, 

(a) Probability Density Function 



TL/H/8712-14 




a] b| 

TL/H/8712-15 

(b) Cumulative Distribution Function 
FIGURE A.5.1 

In many texts the notation used in describing the cdf is 

F x (x) = Prob[X < x] (A.5-2) 

and is defined to be the probability of the event that the 
observed random variable X is less than or equal to the 
allowed or conditional value x. This implies 

F x (x) = Prob[X < x] = ' 
It can be further noted that 



Problx^x^ 



, f* 2 
x<x 2 ]= f; 

J X1 



f X (x) dx (A 5 . 3) 

fx(x)dx=F x (x 2 )-F x (x 1 ) 



(A.5-4) 

and that from equation (A.5-1) the pdf can be related to the 
cdf by the derivative 

. d[Fx(x)l 



Re-examining Figure A.5. 1 does indeed show that the pdf, 
fx(x), is a plot of the differential change in slope of the cdf, 
F X (x). 

Fx(x) and a summary of its properties. 
1.0<F x (x)<1 -oo< x <co (Since F x =Prob [X<x] 

is a probability) 

2. F x (-°°) = 0F x (+») = 1 

3. Fx(x) the probability of occurrence increases as x in- 
creases 

4. F x (x) = X f x (x) dx 

5. Prob ( X1 < x < x 2 ) = F x (x 2 ) - F X ( X1 ) 

A.6 MEAN VALUES, VARIANCES AND 
STANDARD DEVIATION 

The procedure of determining the average weight of a group 
of objects by summing their individual weights and dividing 
by the total number of objects gives the average value of x. 
Mathematically the discrete sample mean can be described 



(A.6-1) 



tm 
I xf x (x) dx (A.6-2) 

J -oo 



for the continuous case that mean value of the random vari- 
able X is defined as 

x = E[X] 



where E[X] is read "the expected value of X". 
Other names for the same mean value x or the expected 
value E[X] are average value and statistical average. 
It is seen from equation (A.6-2) that E[X] essentially repre- 
sents the sum of all possible values of x with each value 
being weighted by a corresponding value of the probability 
density function of f x (x). 

Extending this definition to any function of X for example 
h(x), equation (A.6-2) becomes 



E[h(x)] 



h(x)f x (x)dx 



(A.6-3) 



An example at this point may help to clarify matters. As- 
sume a uniformly dense random variable of density 1/4 be- 
tween the values 2 and 6, see Figure A.6.1. The use of 
equation (A.6-2) yields the expected value 



E[X] 



j:» 



%dx 



x2|6 
8 | 2 



'xM 



i MEAN VALUE 

' OR CENTER OF GRAVITY 
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FIGURE A.6.1 



fx(x) 



dx 



(A.5-5) 
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which can also be interpreted as the first moment or center 
of gravity of the density function fx(x). The above operation 
is analogous to the techniques in electrical engineering 
where the DC component of a time function is found by first 
integrating and then dividing the resultant area by the inter- 
val over which the integration was performed. 
Generally speaking, the time averages of random variable 
functions of time are extremely important since essentially 
no statistical meaning can be drawn from a single random 
variable (defined as the value of a time function at a given 
single instant of time). Thus, the operation of finding the 
mean value by integrating over a specified range of possible 
values that a random variable may assume is referred to as 
ensemble averaging. 

In the above example, x was described as_the first moment 
m-| or DC value. The mean-square value x 2 or E[X 2 ] is the 
second moment m 2 or the total average power, AC plus DC 
and in general the nth moment can be written 



x2 



(A.6-7d) 



E[Xn] 



[h(x)] 2 f x (x)dx ( A - 6 " 4 ) 



Note that the first moment squared m^ x 2 or E[X] 2 is equiv- 
alent to the DC power through a Ifi resistor and is not the 
same as the second moment rri2, x 2 or E[X] 2 which, again, 
implies the total average power. 

A discussion of central moments is next in store and is sim- 
ply defined as the moments of the difference (deviation) 
between a random variable and its mean value. Letting 
[h(x)] n = (X - x) n , mathematically 



(X - x)" = E[(X - x)n] 



(X 



x)n f x (x) dx 
J ~°° (A.6-5) 

For n = 1, the first central moment equals zero i.e., the AC 
voltage (current) minus the mean, average or DC voltage 
(current) equals zero. This essentially yields little informa- 
tion. The second central moment, however, is so important 
that it has been named the variance and is symbolized by 
cr 2 . Hence, 



o- 2 = E[(X - x) 2 ] 



(X-x) 2 f x (x)dx 



(A.6-6) 



Note that because of the squared term, values of X to either 
side of the mean x are equally significant in measuring varia- 
tions or deviations away from x i.e., if x = 10, X^ = 12 and 
X 2 = 8 then (12 - 10) 2 = 4 and (8 - 10) 2 = 4 respective- 
ly. The variance therefore is the measure of the variability of 
[h(x)] 2 about its mean value x or the expected square devia- 
tion of X from its mean value. Since, 



(A.6-7a) 
(A.6-7b) 
(A.6-7C) 



r 2 = E[(X - x) 2 ] = E X 2 - 2xX + x, 
= E[X 2 ] - 2x E[X] + Xi 
= x 2 - 2xx + x? 



or 



m 2 - mt (A.6-7e) 

The analogy can be made that variance is essentially the 
average AC power of a function h(x), hence, the total aver- 
age power, second moment m 2 , minus the DC power, first 
moment squared m,. The positive square root of the vari- 
ance. 

W? = o- 
is defined as the standard deviation. The mean indicates 
where a density is centered but gives no information about 
how spread out it is. This spread is measured by the stan- 
dard deviation <r and is a measure of the spread of the 
density function about x, i.e., the smaller <r the closer the 
values of X to the mean. In relation to electrical engineering 
the standard deviation is equal to the root-mean-square 
(rms) value of an AC voltage (current) in circuit. 

A summary of the concepts covered in this section are listed 
in Table A.6.1. 

A.7 FUNCTIONS OF TWO JOINTLY DISTRIBUTED 
RANDOM VARIABLES 

The study of jointly distributed random variables can per- 
haps be clarified by considering the response of linear sys- 
tems to random inputs. Relating the output of a system to its 
input is an example of analyzing two random variables from 
different random processes. If on the other hand an attempt 
is made to relate present or future values to its past values, 
this, in effect, is the study of random variables coming from 
the same process at different instances of time. In either 
case, specifying the relationship between the two random 
variables is established by initially developing a probability 
model for the joint occurrence of two random events. The 
following sections will be concerned with the development 
of these models. 

A.8 JOINT CUMULATIVE DISTRIBUTION FUNCTION 

The joint cumulative distribution function (cdf) is similar to 
the cdf of Section A. 5 except now two random variables are 
considered. 

FxY(x,y) = Prob [X < x, Y < y] (A.8-1) 

defines the joint cdf, Fxytx.y), of random variables X and Y. 
Equation (A.8-1) states that F X y(x, y) is the probability asso- 
ciated with the joint occurrence of the event that X is less 
than or equal to an allowed or conditional value x and the 
event that Y is less than or equal to an allowed or condition- 
al value y. 
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TABLE A.6-1 



Symbol 


Name 


Physical Interpretation 


X, E[X],m i: xf x (x)dx 


Expected Value, 
Mean Value, 
Statistical Average 
Value 


• Finding the mean value of a random voltage (current) is 
equivalent to finding its DC component. 

• First moment; e.g., the first moment of a group of masses is just 
the average location of the masses or their center of gravity. 

• The range of the most probable values of x. 


E[X]2,X2, mi 




• DC power 


X2, E[X2], m 2 : x2f x (x) dx 

J -oo 


Mean Square Value 


• Interpreted as being equal to the time average of the square of a 
random voltage (current). In such cases the mean-square value 
is proportional to the total average power (AC plus DC) through a 
1 Q. resistor and its square root is equal to the rms or effective 
value of the random voltage (current). 

• Second moment; e.g., the moment of inertia of a mass or the 
turning moment of torque about the origin. 

• The mean-square value represents the spread of the curve about 
x = 


var[],o-2, (X-x)2, E[(X-x)2], 
) 

m2;J ° oo (x-S)2fx(x)dx 


Variance 


• Related to the average power (in a 1 f! resistor) of the AC 
components of a voltage (current) in power units. The square 
root of the variances is the rms voltage (current) again not 
reflecting the DC component. 

• Second movement; for example the moment of inertia of a mass 
or the turning moment of torque about the value x. 

• Represents the spread of the curve about the value x. 


vO" 2 ,cr 


Standard Deviation 


• Effective rms AC voltage (current) in a circuit. 

• A measure of the spread of a distribution corresponding to the 
amount of uncertainty or error in a physical measurement or 
experiment. 

• Standard measure of deviation of X from its mean value x. 



(X) 2 is a result of smoothing the data and then squaring it and (X 2 ) results from squaring the data and then smoothing it. 
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A few properties of the joint cumulative distribution function 
are listed below. 

1 . < F X Y(x,y) <1 - oo < x < oo 

— oo < y < oo 

(since F X y(x,y) = Prob[X < x, Y < y] is a probability) 

2. F XY (-°°,y) = 
Fxy(x.-°°) = 

Fxy( _ oo j -oo) = 

3- Fxy(+°°,+ °°) = 

4. FxY<x,y) The probability of occurrence 

increases as either x or y, or 

both increase 

A.9 JOINT PROBABILITY DENSITY FUNCTION 

Similar to the single random variable probability density 
function (pdf) of sections A.4 and A.5, the joint probability 
density function fxY(x, y) is defined as the partial derivative 
of the joint cumulative distribution function Fxy(x, y). More 
clearly, 

d2 
fXY(x.y) = ^— F XY (x,y) (A.9-1) 

Recall that the pdf is a density function and must be inte- 
grated to find a probability. As an example, to find the prob- 
ability that (X, Y) is within a rectangle of dimension (x 1 < X 
< x 2 ) and (y-i < Y < y 2 ), the pdf of the joint or two-dimen- 
sional random variable must be integrated over both ranges 
as follows, 



Prob[x 1 < X < x 2 , yi < Y < y 2 ] 

'Y2 



(A.9-2) 



J xi J 



x, J y. 



fxY(x.y) dydx = 1 



It is noted that the double integral of the joint pdf is in fact 
the cumulative distribution function 



Fxy(x, y) 



n: 



fxY(x, y) dxdy = 1 (A.9-3) 



analogous to Section A.5. Again fxytx, y) dxdy represents 
the probability that X and Y will jointly be found in the ranges 

dx dy 

x + — and y + — , 
2 2 

respectively, where the joint density function fxy(x, y) has 

been normalized so that the volume under the curve is unity. 

A few properties of the joint probability density functions are 

listed below. 

1. fxy(x,y) > For all values of x and y or — oo < x < 

°° and — oo < y < °°, respectively 



*-{{: 



fxY(x.y) dxdy = 1 



3. F XY (x,y) 



w. 



fxY(x.y) dxdy 



4. Prob[x 1 <X<x 2 , yi<Y<y 2 ]: 



fy2 Tx2 

fxY(x,y) dxdy 

J y, J x, 



A.10 STATISTICAL INDEPENDENCE 

If the knowledge of one variable gives no information about 
the value of the other, the two random variables are said to 
be statistically independent. In terms of the joint pdf 

fxY(x.y) = f x M My) (A.10-1) 

and 

fx(x) f Y (y) = fxY(x.y) (A. 10-2) 

imply statistical independence of the random variables X 
and Y. In the same respect the joint cdf 

Fxy(x, y) = F x (x)F Y (y) (A.10-3) 

and 

F X (x)F Y (y) = F XY (x, y) (A. 10-4) 

again implies this independence. 

It is important to note that for the case of the expected 
value E[XY], statistical independence of random variables X 
and Y implies 

E[XY] = E[X] E[Y] (A. 10-5) 

but, the converse is not true since random variables can be 
uncorrelated but not necessarily independent. 
In summary 

1 . Fxy(x,y) = Fx(x) Fy(y) reversible 

2. fxy(x,y) = fx(x) fy(y) reversible 

3. E[XY] = E[X] E[Y] non-reversible 

A.1 1 MARGINAL DISTRIBUTION AND MARGINAL DEN- 
SITY FUNCTIONS 

When dealing with two or more random variables that are 
jointly distributed, the distribution of each random variable is 
called the marginal distribution. It can be shown that the 
marginal distribution defined in terms of a joint distribution 
can be manipulated to yield the distribution of each random 
variable considered by itself. Hence, the marginal distribu- 
tion functions Fx(x) and Fy(y) in terms of Fxy(x, y) are 

Fx(x) = F XY (x, °o) (A.1 1-1) 

and 



Fy(y) = F X Y(°°,y) 



(A.1 1-2) 



respectively. 
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The marginal density functions fx(x) and fy(x) in relation to 
the joint density fxy(x. y) is represented as 

fxM = J _ ^ fxv(x,y) dy (A 1 1 _ 3) 

and 

M*) = J fxy(x.y) dx (A 1 1 . 4) 

respectively. 

A.12 TERMINOLOGY 

Before continuing into the following sections it is appropri- 
ate to provide a few definitions of the terminology used 
hereafter. Admittedly, the definitions presented are by no 
means complete but are adequate and essential to the con- 
tinuity of the discussion. 

Deterministic and Nondeterministic Random Process- 
es: A random process for which its future values cannot be 
exactly predicted from the observed past values is said to 
be nondeterministic . A random process for which the future 
values of any sample function can be exactly predicted from 
a knowledge of all past values, however, is said to be a 
deterministic process. 

Stationary and Nonstationary Random Processes: If the 
marginal and joint density functions of an event do not de- 
pend upon the choice of i.e., time origin, the process is said 
to be stationary. This implies that the mean values and mo- 
ments of the process are constants and are not dependent 
upon the absolute value of time. If on the other hand the 
probability density functions do change with the choice of 
time origin, the process is defined nonstationary. For this 
case one or more of the mean values or moments are also 
time dependent. In the strictest sense, the stochastic pro- 
cess x(f) is stationary if its statistics are not affected by the 
shift in time origin i.e., the process x(f) and x(t + r) have the 
same statistics for any r. 

Ergodic and Nonergodic Random Processes: If every 
member of the ensemble in a stationary random process 
exhibits the same statistical behavior that the entire ensem- 
ble has, it is possible to determine the process statistical 
behavior by examining only one typical sample function. 
This is defined as an ergodic process and its mean value 
and moments can be determined by time averages as well 
as by ensemble averages. Further ergodicity implies a sta- 
tionary process and any process not possessing this prop- 
erty is nonergodic. 

Mathematically speaking, any random process or, i.e., wave 
shape x(t) for which 



lim 

T — 



,x(t) 



T- 



■tJ 
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-112 



x(t)dt = E[x(t)] 



holds true is said to be an ergodic process. This simply says 
that as the averaging time, T, is increased to the limit 
T — * », time averages equal ensemble averages (the ex- 
pected value of the function). 



A.13 JOINT MOMENTS 

In this section, the concept of joint statistics of two continu- 
ous dependent variables and a particular measure of this 
dependency will be discussed. 

The joint moments mjj of the two random variables X and Y 
are defined as 



mi, = E[XiYi] 



rs 



xiyi fxy(x,y) dx dy 



(A.13-1) 



where i + j is the order of the moment. 

The second moment represented as jui 1 or crxy serves as 
a measure of the dependence of two random variables and 
is given a special name called the covariance of X and Y. 
Thus 



P-11 



o-xy = E[(X - x) (Y - y)] 



//: 



(X-x)(Y-y)f XY (x,y)dxdy 



E[XY] - E[X] E[Y] 



(A. 13-2) 



(A. 13-3) 



= m-11 - xy (A.13-4) 

If the two random variables are independent, their covari- 
ance function mil is equal to zero and m-ii, the average of 
the product, becomes the product of the individual averages 
hence. 

fin = (A.13-5) 

implies 

inn = E[(X - x)(Y - y)] = E[X - x] E[Y - y] (A.13-6) 
Note, however, the converse of this statement in general is 
not true but does have validity for two random variables 
possessing a joint (two dimensional) Gaussian distribution. 
In some texts the name cross-covariance is used instead of 
covariance, regardless of the name used they both describe 
processes of two random variables each of which comes 
from a separate random source. If, however, the two ran- 
dom variables come from the same source it is instead 
called the autovariance or auto-covariance. 
It is now appropriate to define a normalized quantity called 
the correlation coefficient, p, which serves as a numerical 
measure of the dependence between two random variables. 
This coefficient is the covariance normalized, namely 



covar[X,Y] 
vvar[X] var[Y] 

°"X cry 



E([X-E[X]][Y-E[Y]] 



VO"x CTy 



{A.13-7) 



(A. 13-8) 
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where p is a dimensionless quantity 
-1 < p < 1 
Values close to 1 show high correlation of i.e., two random 
waveforms and those close to — 1 show high correlation of 
the same waveform except with opposite sign. Values near 
zero indicate low correlation. 

A.14 CORRELATION FUNCTIONS 

If x(t) is a sample function from a random process and the 
random variables 

xi = x(U) 

x 2 = x(t 2 ) 
are from this process, then, the autocorrelation function 
R(t|, t 2 ) is the joint moment of the two random variables; 

Rxx(tl,t 2 ) = EMt^xfla)] (A.14-1) 

= J j _ ^ xi x 2 fx-i x 2 (xi .><2) d x1 d X2 
where the autocorrelation is a function of ti and t 2 . 
The auto-covariance of the process x(t) is the covariance of 
the random variables x(t-|) x(t 2 ) 

Cxx(tl,t 2 ) = Eflxft) - xWl Mt 2 ) - xfej]) (A. 14-2) 
or rearranging equation (A.14-1) 
cft.y = EfMt,) - xfti)] [x(t 2 ) - xfta)]) 

= Efx^xfe) - xft^xOa) - x(t|)x(t 2 ) 



= E(x(t!) x(t 2 ) - xft) E[x(t 2 )] - Elxft)] x(t 2 ) 
+ EMt^lEMtg)]! 

= E[x(ti)x(t2)] - E[x(t!)] E[x(t 2 )] 
- EIx^)] E[x(t 2 )] + Elxfr)] E[x(t 2 )] 

= E [x(t 1 ) x(t 2 )] - E [x(t! )] E [x(t 2 )] (A. 1 4-3) 

or 

= Rft, t 2 ) - Etxft)] E[x(t 2 )] (A.14-4) 

The autocorrelation function as defined in equation (A.14-1) 
is valid for both stationary and nonstationary processes. If 
x(t) is stationary then all its ensemble averages are indepen- 
dent of the time origin and accordingly 

Rxx('l.t 2 ) = Rxx(*1 +T, t 2 + T) (A.14-5a) 

= E[x(t! + T), x(t 2 + T)] (A.14-5b) 



Due to this time origin independence, T can be set equal to 
— 1|, T = — 1|, and substitution into equations (A.14-5a, b) 
Rxx(tl.t 2 ) = Rxx(0,t 2 - h) (A.14-6a) 

= E[x(0)x(t 2 -t-|)] (A.14-6b) 

imply that the expression is only dependent upon the time 
difference t 2 — t|. Replacing the difference with t = t 2 — t-| 
and suppressing the zero in the argument Rxx( n . '2 — ti) 
yields 

R xx (r) = E[x(t 1 )x(t 1 -t)] (A. 14-7) 

Again since this is a stationary process it depends only on r. 
The lack of dependence on the particular time, t|, at which 
the ensemble was taken allows equation (A.14-7) to be writ- 
ten without the subscript, i.e., 

R xx (r) = E[x(t)x(t + T)] (A. 14-8) 

as it is found in many texts. This is the expression for the 
autocorrelation function of a stationary random process. 
For the autocorrelation function of a nonstationary process 
where there is a dependence upon the particular time at 
which the ensemble average was taken as well as on the 
time difference between samples, the expression must be 
written with identifying subscripts, i.e., R xx (ti, t 2 ) or 
Rxx(ti, r). 

The time autocorrelation function can be defined next and 
has the form 

1 fT/2 



flxxM 



lim 
t— » 



x(t)x(t + T)dt (A. 14-9) 



>Tj -T/2 

For the special case of an ergodic process (Ref. Appendix 
A.1 2) the two functions, equations (A.14-8) and (A.14-9), are 
equal 

/?xxM = RxxM (A.1 4-10) 

It is important to point out that if t = in equation (A.14-7) 
the autocorrelation function 

R xx (0) = E[x(t 1 )x(t 1 )] (A.14-1 1) 

would equal the mean square value or total power (AC plus 
DC) of the process. Further, for values other than t = 0, 
R x (r) represents a measure of the similarity between its 
waveforms x(t) and x(t + t). 
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In the same respect as the previous discussion, two random 
variables from two different jointly stationary random pro- 
cesses x(t) and y(t) have for the random variables 

*1 = x(ti) 
y 2 = y(ti + t) 
the crosscorrelation function 



APPENDIX B 

B.O INTERCHANGING TIME INTEGRATION 
AND EXPECTATIONS 

If f(t) is a nonrandom time function and a(t) a sample func- 
tion from a random process then, 

(B.0-1) 



R xy (r) = E(x(t 1 )y(t 1 + T)] (A. 14-1 2) 

= J J _ x xiy2 fxiy 2 ( x l.y2) dxi dy 2 

The crosscorrelation function is simply a measure of how 
much these two variables depend upon one another. 
Since it was assumed that both random processes were 
jointly stationary, the crosscorrelation is thus only depen- 
dent upon the time difference r and, therefore 

Rxy(i-) = RyxM (A.14-13) 

where 

yi = y(ti) 

x 2 = x(ti + t) 

and 



ft 2 ft 2 

a(t)f(t)dt = 

J t-! J t-. 



" E[a(t)] f(t) dt 



This is true under the condition 

a) [ 2 E[|a(t)|] |f(t)| dt < °o 
J t, 



R yx (r) = Efy^Jx^+r)] 



(A.14-14) 



n: 



yix 2 fy 1 x 2 (yi,x 2 )dy 1 dx 2 
The time crosscorrelation functions are defined as before by 



(B.0-2) 

b) a(t) is bounded by the interval ti to t 2 . [tiand t 2 may be 
infinite and a(t) may be either stationary or nonstation- 
ary] 

APPENDIX C 

CO CONVOLUTION 

This appendix defines convolution and presents a short 
proof without elaborate explanation. For complete definition 
of convolution refer to National Semiconductor Application 
Note AN-237. 
For the time convolution if 

f(t) <— >• F(«) (C.0-1) 

x(t) <— >• X(<o) (C.0-2) 

then 



flxyW = ^ oo \ j 1 '_ 2 / *« y(* + T ) dt 

and 


(A.14-15) 


y(t)= f " x(r)f(t 

J -oo 




r) dr <-^- Y(«) = X 


O)) 


(C.0-3) 


lim 1 f T/2 

flyxM-l'^ooyJ _ T/2 y(t)x(t + T)dt 


(A.14-16) 


or 
y(t) = x(t) * f(t) <-^- Y(a>) = X(«) * F(co) (C.0-4) 


and finally 




proof: 


flxyW = Rxy(i-) 


(A.14-17) 


Taking the Fourier transform, F[ ] , of y(t) 


Ryx(T) = Ryx(T) 


(A.14-18) 


r i (C.0-5) 


for the case of jointly ergodic random processes 




F[y(t)] = Y(co)= [' 

J -oo 


x(t) f(t - t) dr 


£-i»tdt 






Y<a>) = f °° x(r) 

J -oo 


i f(t-T)-i«tdt 

J -oo 


(C.0-6) 






dr 



and letting k = t — t, then, dk = dt and t = k + t. 
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Thus, 






Y(o>) = J" °°^ X(T) 


I f(k)«-io>(k+ r)dk 

J -oo 


dT (C.0-7) 


Too 
= X(T)£-|WTd T 


i " f(k)e 

J -oo 


-jwkdk 


(C.0-8) 



Y(o>) = X(o>) • F(oi) 






(C.0-9) 


For the frequency convolution of 








f(t) <— ► F(o>) 






(C.0-10) 


x(t) <-» X(o>) 






(C.0-11) 


then 








H(o>) = — F(v) X(o> - v) dv 

27T J — oo 


*-»■ 


h(t) = 


= f(t) • x(t) 
(C.0-12) 



H(o>) = — F(e>) * X(o>) *—*■ h(t) = f(t) • x(t) (C.0-1 3) 

27T 

proof: 

Taking the inverse Fourier transform F _1 [ ] of equation 

(C.0-1 3) 



h(t) 



(C.0-1 4) 



i/)(o) — v) dv d<°t da 



F _ 1 r X (o)) • F( M ) i 

-J-f- 

2ttJ -oo 

[ill- 

= ( — J I ' F(v) J ' X(o - v) do>* do> dv 

(C.0-1 5) 

and letting g = a> — v, then dg = do) and a> = g + v. 
Thus, 

r _ 1 X(qi)*F(ai) 
2tt 

(C.0-1 6) 

(i^) 2 j ^00 F(V) j ^ X(9) d<9 + ^ d9 dV 

1 r oo f» 

— F(v)eMdv« X(g)eigtdg (C.0-1 7) 

'.IT J - oo J - oo 

(C.0-1 8) 



h(t) 

h(t) = f(t).x(t) 
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